Open Deep Research

A Flexible, Configurable Research Agent

Open Deep Research Overview
  • Built on LangGraph - Simple and configurable
  • Users can bring their own models, search tools, and MCP servers
  • Open source implementation available on GitHub
  • Try it out on Open Agent Platform

TL;DR

Deep Research as an Agent Application

  • Deep research has become one of the most popular agent applications
  • Major players with deep research products:
    • OpenAI
    • Anthropic
    • Perplexity
    • Google
  • All produce comprehensive reports using various sources of context
  • Many open source implementations available

We've built an open deep researcher that is simple and configurable, allowing users to bring their own models, search tools, and MCP servers.

Challenge

Research as an Open-Ended Task

Research is an open-ended task; the best strategy to answer a user request can't be easily known in advance.

"Compare these two products"

Comparisons benefit from searching each product, followed by synthesis.

"Find the top 20 candidates for this role"

Listing/ranking requires open-ended search, synthesis, and ranking.

"Is X really true?"

Validation questions require iterative deep research where source quality matters more than breadth.

Key Design Principle: Flexibility to explore different research strategies depending on the request.

Architectural Overview

Three-Step Research Process

Agents are well suited to research because they can flexibly apply different strategies, using intermediate results to guide exploration.

Three-Step Research Process
  • Scope – clarify research scope
  • Research – perform research
  • Write – produce the final report
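The three phases above can be sketched as a simple pipeline over a shared state dict. This is an illustrative sketch only (function names and state keys are hypothetical); the actual implementation wires these phases together as nodes in a LangGraph graph:

```python
# Hypothetical sketch of the three-phase flow; the real system runs
# these phases as nodes in a LangGraph state graph.

def scope(state: dict) -> dict:
    # Clarify the request and distill it into a research brief.
    state["brief"] = f"Research brief for: {state['request']}"
    return state

def research(state: dict) -> dict:
    # A supervisor would delegate sub-topics to sub-agents here.
    state["findings"] = [f"Findings on: {state['brief']}"]
    return state

def write(state: dict) -> dict:
    # One final LLM call turns brief + findings into a report.
    state["report"] = "\n".join(state["findings"])
    return state

def run_pipeline(request: str) -> dict:
    state = {"request": request}
    for phase in (scope, research, write):
        state = phase(state)
    return state
```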

Phase 1: Scope

Gather User Context for Research

Scope Phase

User Clarification

Users rarely provide sufficient context in a research request. We use a chat model to ask for additional context if necessary.
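One way to implement this gate is to ask the chat model for a structured verdict on whether clarification is needed. The schema and heuristic below are illustrative stand-ins (the real system makes an LLM call here, not a word count):

```python
from dataclasses import dataclass

@dataclass
class ClarifyDecision:
    """Structured output we might request from the chat model."""
    need_clarification: bool
    question: str  # follow-up question to ask the user, if any

def decide(request: str) -> ClarifyDecision:
    # Stand-in for a chat-model call with structured output: a real
    # implementation would prompt the model to fill this schema.
    if len(request.split()) < 5:  # crude proxy for "too little context"
        return ClarifyDecision(True, "Could you share more detail about your goal?")
    return ClarifyDecision(False, "")
```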


Brief Generation

We translate the verbose researcher-user chat interaction into a comprehensive yet focused research brief. The brief serves as our north star for success: the research supervisor measures its gathered findings against it.
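Brief generation can be as simple as one templated LLM call over the scoping transcript. The prompt wording below is illustrative, not the actual prompt:

```python
BRIEF_PROMPT = """You are preparing a research brief.
Conversation so far:
{transcript}

Distill the conversation into a focused research brief that states the
user's goal, the required scope, and the criteria for a successful report."""

def build_brief_prompt(messages: list[tuple[str, str]]) -> str:
    # messages: (role, content) pairs from the scoping conversation
    transcript = "\n".join(f"{role}: {content}" for role, content in messages)
    return BRIEF_PROMPT.format(transcript=transcript)
```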

Phase 2: Research

Gather Context Using a Supervisor Agent

Research Phase

Research Supervisor

The supervisor delegates research tasks to an appropriate number of sub-agents. It determines if the research brief can be broken down into independent sub-topics and delegates to sub-agents with isolated context windows.
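The delegation step can be sketched as: split the brief into sub-topics, then run one sub-agent per topic with only that topic in its context. The splitting rule and function names here are hypothetical (the real supervisor is an LLM deciding how to decompose the brief):

```python
import concurrent.futures

def split_into_subtopics(brief: str) -> list[str]:
    # Stand-in for a supervisor LLM call that decides whether the brief
    # decomposes into independent sub-topics (here: a naive split).
    return [t.strip() for t in brief.split(" vs ")]

def run_subagent(topic: str) -> str:
    # Each sub-agent receives only its own topic: an isolated context window.
    return f"findings for {topic}"

def supervise(brief: str) -> list[str]:
    topics = split_into_subtopics(brief)
    # Independent sub-topics can be researched in parallel.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        return list(pool.map(run_subagent, topics))
```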

Research Sub-Agents

Each sub-agent focuses on a specific topic and conducts research as a tool-calling loop, using search tools and/or MCP tools configured by the user.

We make an additional LLM call to clean sub-agent research findings so that the supervisor is provided with clean, processed information.
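A sub-agent's inner loop can be sketched as repeated tool calls followed by a cleanup pass. Here `search_tool` stands in for a user-configured search/MCP tool and `clean_findings` for the extra LLM call; both are simplifications:

```python
def sub_agent(topic: str, search_tool, max_turns: int = 3) -> str:
    notes = []
    for turn in range(max_turns):
        # In the real loop an LLM picks the tool and query each turn;
        # here we issue one refined query per turn.
        notes.append(search_tool(f"{topic} (turn {turn})"))
    raw = "\n".join(notes)
    return clean_findings(topic, raw)

def clean_findings(topic: str, raw: str) -> str:
    # Stand-in for the extra LLM call that prunes irrelevant tokens so
    # the supervisor sees only clean, processed findings.
    kept = [line for line in raw.splitlines() if topic in line]
    return "\n".join(kept)
```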

Phase 3: Report Writing

Produce the Final Report

The goal of report writing is to fulfill the request in the research brief using the gathered context from sub-agents.

When the supervisor deems that the gathered findings are sufficient to address the request in the research brief, we move ahead to write the report.

Report Writing

To write the report, we provide an LLM with the research brief and all of the research findings returned by sub-agents. This final LLM call produces the report in one shot, steered by the brief and grounded in the research findings.
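Assembling that final call amounts to putting the brief and all findings into one prompt. The template below is illustrative, not the actual prompt:

```python
REPORT_PROMPT = """Write a report that answers the research brief below,
using only the findings provided.

Research brief:
{brief}

Findings:
{findings}"""

def build_report_prompt(brief: str, findings: list[str]) -> str:
    # All sub-agent findings enter one context for a single final call.
    return REPORT_PROMPT.format(brief=brief, findings="\n\n".join(findings))
```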

Lessons

Multi-Agent Considerations

Only use multi-agent for easily parallelized tasks

Multi vs. single-agent is an important design consideration. Cognition has argued against multi-agent because sub-agents working in parallel can be difficult to coordinate.

Earlier versions of our research agent wrote sections of the final report in parallel with sub-agents. It was fast, but the reports were disjoint because the section-writing agents were not well coordinated.

Multi-agent systems are hard to coordinate, and they perform poorly when writing sections of the report in parallel. We restrict multi-agent to research, and write the report in one shot.

Lessons

Context Isolation Benefits

Multi-agent is useful for isolating context across sub-research topics

Single-agent response quality suffers if the request has multiple sub-topics. The intuition is straightforward: a single context window must store and reason about tool feedback across all sub-topics.

Compare the approaches of OpenAI vs Anthropic vs Google DeepMind to AI safety. I want to understand their different philosophical frameworks, research priorities, and how they're thinking about the alignment problem.

Our single agent implementation used its search tool to send separate queries about each frontier lab at the same time, but had to juggle context from three independent threads.

Multi-Agent Approach
Context isolation of sub-topics during research can avoid various long-context failure modes.

Lessons

Supervisor Flexibility

Multi-agent supervisor enables the system to tune to required research depth

Users do not want simple requests to take 10+ minutes. But some requests genuinely require deeper research, with higher token utilization and latency.

The supervisor can handle both cases by selectively spawning sub-agents to tune the level of research depth needed for a request. The supervisor is prompted with heuristics to reason about when research should be parallelized, and when a single thread of research is sufficient.
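One heuristic such a prompt might encode: parallelize only when the sub-topics can be researched without sharing intermediate results; otherwise keep a single thread. A toy version of that decision rule (names and schema are hypothetical):

```python
def plan_research(subtopics: list[str], independent: bool) -> dict:
    # Heuristic a supervisor prompt might encode: spawn parallel
    # sub-agents only for multiple, mutually independent sub-topics.
    if independent and len(subtopics) > 1:
        return {"strategy": "parallel", "agents": len(subtopics)}
    return {"strategy": "single-thread", "agents": 1}
```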

Research Supervisor Iteration
A multi-agent supervisor allows for flexibility of search strategy.

Lessons

Context Engineering

Context Engineering is important to mitigate token bloat and steer behavior

Research is a token-heavy task. Anthropic reported that their multi-agent system used 15x more tokens than a typical chat application!

We used context engineering to mitigate this:

  • We compress the chat history into a research brief
  • Sub-agents prune their research findings to remove irrelevant tokens

Without sufficient context engineering, our agent was prone to running into context window limits from long, raw tool-call results.

Context engineering has many practical benefits. It saves tokens, helps avoid context window limits, and helps stay under model rate limits.
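A crude version of the pruning step is a hard token budget per tool result before it enters the agent's context. The whitespace tokenizer and fixed budget below are simplifications of what a real implementation would use:

```python
def prune_tool_result(text: str, budget: int = 50) -> str:
    # Truncate a raw tool result to a fixed token budget before it
    # enters the agent's context (whitespace split as a rough tokenizer).
    tokens = text.split()
    if len(tokens) <= budget:
        return text
    return " ".join(tokens[:budget]) + " …[truncated]"
```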

Next Steps

Future Improvements

Open Deep Research is a living project, and we have several ideas we want to try. These are some of the open questions we're thinking about:

  • What is the best way to handle token-heavy tool responses, and what is the best way to filter out irrelevant context to reduce unnecessary token expenditure?
  • Are there any evaluations worth running in the hot path of the agent to ensure high quality responses?
  • Deep research reports are valuable and relatively expensive to create. Can we store this work and leverage it in the future with long-term memory?

Research Brief Example

Using Open Deep Research

Get Started Today

LangGraph Studio

You can clone our LangGraph code and run Open Deep Research locally with LangGraph Studio.

You can use Studio to test out the prompts and architecture and tailor them more specifically to your use cases!

View on GitHub

Open Agent Platform

We've hosted Open Deep Research on our demo instance of Open Agent Platform (OAP).

OAP is a citizen-developer platform that lets users build, prototype, and use agents - all you need to do is pass in your API keys.

Try it on OAP